Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning
Authors
Abstract
Similar resources
Parallel Packet Processing on Multi-core and Many-core Processors
The Service-oriented Router (SoR), a highly functional router based on a novel router architecture, enables unprecedented web services traditional routers were unable to provide. The SoR performs Deep Packet Inspection (DPI) to analyze Layer 7 information, which is becoming increasingly difficult due to the substantial increase in Internet traffic. Meanwhile, multi-core processors and general-p...
Shared Memory in the Many-Core Age
With the evolution toward fast networks of many-core processors, the design assumptions at the basis of software-level distributed shared memory (DSM) systems change considerably. But efficient DSMs are needed because they can significantly simplify the implementation of complex distributed algorithms. This paper discusses implications of the many-core evolution and derives a set of reusable el...
Tuning HipGISAXS on Multi and Many Core Supercomputers
With the continual development of multi and manycore architectures, there is a constant need for architecture-specific tuning of application-codes in order to realize high computational performance and energy efficiency, closer to the theoretical peaks of these architectures. In this paper, we present optimization and tuning of HipGISAXS, a parallel X-ray scattering simulation code [1], on vario...
The Glasgow Parallel Reduction Machine: Programming Shared-memory Many-core Systems using Parallel Task Composition
We present the Glasgow Parallel Reduction Machine (GPRM), a novel, flexible framework for parallel task-composition based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In this paper we discuss the GPRM, the virtual machin...
ZNN - Fast and Scalable Algorithm for 3D ConvNets on Multi-Core and Many-Core Shared Memory Machines
Convolutional networks (ConvNets) have become a popular approach to computer vision. It is important to accelerate ConvNet training, which is computationally costly. We propose a novel parallel algorithm based on decomposition into a set of tasks, most of which are convolutions or FFTs. Applying Brent’s theorem to the task dependency graph implies that linear speedup with the number of processo...
Journal
Journal title: The International Journal of High Performance Computing Applications
Year: 2012
ISSN: 1094-3420,1741-2846
DOI: 10.1177/1094342012440466